From: route@monster.com
Sent: Tuesday, June 04, 2013 3:54 PM
To: hg@apeironinc.com
Subject: Please review this candidate for: Big Data
This resume has been forwarded to
you at the request of Monster User xapeix01
Bin Yu, Ph.D.
San Jose, CA 95148
(408) 623-2346
binyumail@gmail.com
http://divinev.com

OVERVIEW
· More than 10 years of successful track record in the design and development of large-scale distributed systems for big data analytics and internet search. Strong hands-on experience in text analytics, machine learning, data mining, and large-scale distributed system architecture.
· Demonstrated leadership in technical project and organizational resource management, leading multiple onshore and offshore development teams and delivering high-quality products.

TECHNICAL EXPERTISE
· Deep knowledge of the Hadoop ecosystem, including HDFS, MapReduce, HBase, Sqoop, Flume, Pig, and Hive.
· Architect-level hands-on programming skills in Java, C++, Perl, and SQL.
· Many years of experience in big data mining, text analytics, machine learning, internet search, large-scale distributed system infrastructure, middleware, and tooling.

WORK EXPERIENCE
2011-Present: Rearden Commerce
Head of Big Data Analytics Engineering, CTO Organization
· Assumed increasing responsibility for the architecture design and system development of the analytics platform and infrastructure supporting big data analytics and the relevance engine for business intelligence and e-commerce recommendation. Led the effort to build the analytics infrastructure from scratch with Hadoop and its ecosystem, including HBase, Mahout, Pig, and Hive, as well as AWS cloud technologies.
· Leading projects to build and productize data mining and machine learning algorithms for e-commerce personalization in travel booking, deal offers, and B2B e-commerce.
· Managing the research and development of sentiment analysis using natural language processing technology based on OpenNLP and WordNet.
· Responsible for data collection from internal and external sources, including Oracle, MySQL, and popular social network and review sites, and for integration with the data warehouse.
· Building the Analytics Engineering team; responsible for managing and maintaining the Hadoop cluster that supports MapReduce, Pig, and Hive, continuously adding functional modules and UDF libraries.

2006-2011: Ask.com (competitor of Google)
Head of Search Metrics and Infrastructure, Search Technology
· Built and managed distributed technology teams onshore and offshore with increasing performance and responsibility. Led the team that built an HBase-like proprietary NoSQL database based on Google’s BigTable.
· Assumed broad responsibility for development of a new generation of internet search engine infrastructure, including middleware, the logging system, common services, and tools. Achieved a performance breakthrough in core components that led to an end-to-end throughput of half a billion pages per day.
· Maintained leadership in a variety of advanced technologies in the search engine business. Restructured and streamlined the product development process for classification, categorization, and ranking algorithms based on machine learning and data mining techniques, achieving continuous improvement in user experience.
· Led large-scale data analytics projects for business intelligence, user behavior analysis, and monetization opportunity discovery. Achievements also include better advertising performance and improved SEM models.

2002-2006: Hologic
Senior Manager, Advanced Technology
· Managed a group of scientists and software engineers in the advanced technology department. Strengthened the company’s competitiveness by maintaining its leading position in breast cancer detection against competitors.
· Assumed responsibility for product specification, prioritization, scheduling, algorithm project management, and system architecture design.
· Managed all aspects of the development of medical data mining, machine learning, and neural network algorithms for structured and unstructured medical data analysis and disease diagnosis.

2002: Rational Software (part of IBM)
Project Lead, Enterprise Software
· Led the design and development of ClearQuest charting and reporting components for its new-generation SaaS solution.

1999-2002: Akamai Technologies
Tech Lead, Streaming Product
· Responsible for the architecture design and system development of scalable distributed streaming services. Built the first internet service for streaming webcasts for corporate communication.
· Led the design and development of the decentralized web publishing system, encoder automation system, and logging and monitoring system, with a high degree of scalability, fault tolerance, and reliability.

1997-1999: Electroglas
Staff Engineer, Computer Vision
· Designed and developed the machine learning algorithms for defect detection, one of the key components in successfully building the new generation of products.

EDUCATION
· Postdoctoral Research Fellow, Computer Science, Michigan State University.
· Ph.D., Electronic Engineering, Tsinghua University.
· MS, Biomedical Engineering, Tianjin University.

PUBLICATIONS
· CMEIAS: A computer-aided system for image analysis of bacterial morphotypes in microbial communities, Microbial Ecology.
· Automatic text location in images and video frames, Pattern Recognition.
· Document representation and its application to page decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence.
· A generic system for form dropout, IEEE Transactions on Pattern Analysis and Machine Intelligence.
· A robust and fast skew detection algorithm for generic documents, Pattern Recognition.
· A consistent attribute graph-based hand drawn circuit diagram reading system, Chinese Journal of Electronics.
· A global optimum clustering algorithm, Engineering Applications of Artificial Intelligence.
· The image contour extraction of engineering drawings and its applications to recognizing hand writing characters, Journal of Northern Jiaotong University.
· A more efficient branch and bound algorithm for feature selection, Pattern Recognition.
· A dynamic selection algorithm for globally optimal subset, Engineering Applications of Artificial Intelligence.
· BF** algorithm for feature selection and its comparison with BF* algorithm, Acta Electronica Sinica.
· Isothetic polygon representation for contours, CVGIP: Image Understanding.
· The tree representation of the graph used in binary image processing, Information Processing Letters.
· Representation of LAG structure used in binary image processing with extended binary tree, Chinese Journal of Computers.
· BAG-based vectorization and its application to recognizing hand-drawn logic circuit diagrams, Acta Electronica Sinica.
· The image boundary tracing and its application to the recognition of hand-written characters, Acta Electronica Sinica.
· NMR medical image analysis in high noise, Chinese Journal of Medical Instrumentation.
· CMEIAS: Center for microbial ecology image analysis system, in Proceedings of the 8th International Symposium on Microbial Ecology, Halifax, Canada.
· Automatic text location in images and video frames, in Proceedings of the 14th International Conference on Pattern Recognition, Brisbane.
· Model-based document representation: application to page segmentation, in Proceedings of the 4th International Conference on Document Analysis and Recognition, Ulm.
· Address block location on complex mail pieces, in Proceedings of the 4th International Conference on Document Analysis and Recognition, Ulm.
· Lane boundary detection using a multiresolution Hough transform, in Proceedings of the IEEE International Conference on Image Processing, Santa Barbara.
· A form dropout system, in Proceedings of the 13th International Conference on Pattern Recognition, Vol. 3, Vienna.
· Document processing research in Michigan State University, in Proceedings of the Symposium on Document Image Understanding Technology, Maryland.
· Automatic understanding of symbol connected diagrams, in Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal.
· A feature selection method for multi-class-set classification, in Proceedings of the IEEE International Joint Conference on Neural Network, Vol. 3, Baltimore.
· The extended binary tree representation of binary image and its application to engineering drawing entry, in Proceedings of the 10th IEEE International Conference on Pattern Recognition, Atlantic.
· An economical contour extraction algorithm for understanding large-size engineering drawings, in Proceedings of the 1st IEEE International Conference on Systems Integration, Morristown.
· A BAG-based vectorizer for automatic diagram reader, in Proceedings of the International Conference on CAD & CG, Beijing.
· The data structure used in image processing and its application to OCR, in Proceedings of the 4th National Conference on Image Science.
· Nuclear magnetic resonance medical imaging with low field, in Proceedings of the 9th IEEE Annual Conference of the Engineering in Medicine and Biology Society, Boston.

KEYWORDS
· Hadoop, HBase, MapReduce, Pig, Hive, Flume, Sqoop, Mahout, Nutch, Lucene, Solr, MySQL, Data Warehouse, WordNet, OpenNLP
· Classification, Large Scale Distributed Systems, Data Mining, Data Analytics, Machine Learning, Information Retrieval, Natural Language Processing
· Big data analytics, Internet Search, Cloud Computing, Amazon AWS EC2
· Java, C++, Perl, REST, Servlet, JSP, XML, JSON, Web Services, J2EE, SaaS, Tomcat and Application Server
· Linux, TCP/IP, HTTP, Eclipse, Scrum, Agile